Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
New Feature

Overview:

  • Created an abstract parent class for ONNXQuantExporter
  • Created child classes for individual precisions
  • Implemented the INT4QuantExporter
  • Removed quantize_weights_to_int4
  • Added a method to quantize weights of the ONNX model to low precision
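The exporter hierarchy described above could be sketched as follows. This is a minimal illustration, not the PR's actual code: the method signature is hypothetical, and the INT4 scheme shown is generic symmetric per-tensor quantization rather than the AWQ calibration used by `int4_awq`.

```python
from abc import ABC, abstractmethod

import numpy as np


class ONNXQuantExporter(ABC):
    """Abstract parent class: one child class per target precision."""

    @abstractmethod
    def quantize_weights(self, weights: np.ndarray) -> tuple[np.ndarray, float]:
        """Quantize a weight tensor; return (integer weights, scale)."""


class INT4QuantExporter(ONNXQuantExporter):
    """Symmetric per-tensor INT4: map values into the range [-8, 7]."""

    def quantize_weights(self, weights: np.ndarray) -> tuple[np.ndarray, float]:
        # Scale so the largest magnitude maps to 7; guard against all-zero tensors.
        scale = float(np.abs(weights).max()) / 7.0 or 1.0
        q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
        return q, scale
```

The abstract base lets `torch_quant_to_onnx.py` dispatch on `--quantize_mode` without precision-specific branches at the call site; NVFP4/MXFP8 would slot in as further subclasses.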

Testing

python torch_quant_to_onnx.py --quantize_mode=int4_awq \
	--onnx_save_path=<onnx_path>

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

@ajrasane ajrasane self-assigned this Nov 18, 2025
@ajrasane ajrasane requested review from a team as code owners November 18, 2025 18:14
@ajrasane ajrasane requested a review from i-riyad November 18, 2025 18:14
@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 16.66667% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.61%. Comparing base (7a36ccc) to head (9a45ddb).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/_deploy/utils/torch_onnx.py 11.76% 15 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
+ Coverage   74.57%   74.61%   +0.04%     
==========================================
  Files         183      183              
  Lines       18412    18546     +134     
==========================================
+ Hits        13730    13839     +109     
- Misses       4682     4707      +25     

☔ View full report in Codecov by Sentry.


@gcunhase
Contributor

If this PR is just for INT4, and NVFP4 and MXFP8 are WIP, can you please update the title accordingly? Thanks!

@ajrasane ajrasane changed the title [OMNIML-2244] Create the ONNX quantization exporter [OMNIML-2244] Implement the ONNX quantization exporter for INT4 Nov 19, 2025
@ajrasane ajrasane requested a review from galagam November 19, 2025 11:03
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from e829998 to 83028fa Compare November 20, 2025 20:34
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from 83028fa to 530de9b Compare November 24, 2025 23:18
Signed-off-by: ajrasane <[email protected]>
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from 530de9b to a4c3e31 Compare November 24, 2025 23:24
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from f4e6f50 to 88567b1 Compare November 26, 2025 03:18
Contributor

@galagam galagam left a comment


Looks good after the last commit. Approved.

next_node = cast_child_nodes[0]

# Store transpose permutation if present
if next_node.op_type == "Transpose":
Collaborator

nit: elif

Contributor Author

We will need to call this after the Cast node is processed.
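The exchange above can be illustrated with a small sketch (the helper name and node objects are hypothetical, not the PR's code): because the walk first advances past a Cast node, the Transpose check must run on the node that follows the Cast, which an `elif` on the original node would skip.

```python
from types import SimpleNamespace


def find_transpose_perm(node, cast_child_nodes):
    """Hypothetical sketch: step over a Cast, then look for a Transpose.

    An `elif` on the second check would never fire for a Cast -> Transpose
    chain, because the Cast branch would already have been taken.
    """
    if node.op_type == "Cast" and cast_child_nodes:
        node = cast_child_nodes[0]  # advance to the Cast's child first
    if node.op_type == "Transpose":  # re-check the (possibly new) node
        return list(node.perm)
    return None
```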

@ajrasane ajrasane enabled auto-merge (squash) November 26, 2025 19:44
@ajrasane ajrasane merged commit 0a4f0a8 into main Nov 26, 2025
27 checks passed
@ajrasane ajrasane deleted the ajrasane/mixed_precision branch November 26, 2025 20:50

5 participants